AD-A227  992 


C  FILE  GOBY 


fyto  36-t*A 


LIKELIHOOD  RATIO  SENSITIVITY  ANALYSIS  FOR 
MARKOVIAN  MODELS  OF  HIGHLY  DEPENDABLE  SYSTEMS 


Marviii'  K.  Nakayama.  Ambuj  Goyal  and  Peter  W.  Glynn 


TECHNICAL  REPORT  No.  54 


January  1990 


Prepared  under  the  Auspices 
of 

U.S.  Army  Research  Contract 
DAAL-03-SS-K-0063 


S 


ELECTE 

0CT241990 


Approved  for  public  release:  distribution  unlimited. 


Reproduction  in  whole  or  in  part  is  permitted  for  any 
purpose  of  the  United  States  government. 


DEPARTMENT  OF  OPERATIONS  RESEARCH 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 


W  /j  003 


Likelihood  Ratio  Sensitivity  Analysis  for 
Markovian  Models  of  Highly  Dependable  Systems 


Marvin  K.  Nakayama1 
Department  of  Operations  Research 
Stanford  University 
Stanford,  California  94305 

Ambuj  Goyal 

IBM  Thomas  J.  Watson  Research  Center 
P.O.  Box  704 

York  town  Heights,  New  York  10598 
Peter  W.  Glynn2 

Department  of  Operations  Research 
Stanford  University 
Stanford,  California  94305 


Abstract 


1 

V. 

\ 

,  O  £  V  1 

v^-  $  y 

|  Accession  For  j 

NTIS 

DTIC 

Unarm 

Justi 

GRAftI  jVf 

TAB  43 

ounced  □ 

fication_  .... 

Bv  _ 

Distribution/ 

Availability  Code? 

Dist 

/H 

Avail 

Spec 

and/or 

lal 

This  paper  discusses  the  application  of  the  likelihood  ratio  gradient  estimator  lo 
simulations  of  large  Markovian  models  of  highly  dependable  systems.  Extensive 
empirical  work,  as  well  as  some  mathematical  analysis  of  small  dependabili¬ 
ty  models,  suggests  that  (in  this  model  setting)  the  gradient  estimators  are 
not  significantly  more  noisy  than  the  estimates  of  the  performance  measures 
themselves.  The  paper  also  discusses  implementation  issues  associated  with 
likelihood  ratio  gradient  estimation,  as  well  as  some  theoretical  complements 
associated  with  application  of  the  technique  to  continuous-time  Markov  chains. 
KEYWORDS:  highly  dependable  systems,  likelihood  ratios,  importance  sam¬ 
pling,  gradient  estimators.  ,  ,  ^  R/  , 
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1  Introduction 


This  paper  discusses  the  application  of  the  likelihood  ratio  gradient  estimator 
to  simulations  of  highly  dependable  systems.  We  believe  that  this  paper  makes 
the  following  contributions  to  the  existing  literature: 

1.  While  the  basis  or  the  likelihood  ratio  gradient  estimation  algorithm  has 
been  known  for  some  lime  (see,  for  example,  [f>],  |7],  |g],  (20],  [22],  and 
|2.t],  murh  less  is  known  about  the  empirical  behavior  of  the  estimator  in 
.practical  problem  settings.  In  this  paper,  we  show,  through  extensive  ex¬ 
perimentation  (see  Section  6),  that  the  likelihood  ratio  gradient  estimator 
is. an  effective  tool  for  measuring  parameter  sensitivity  in  the  context  of 
Markovian  models  of'highly  dependable  systems.  Both  steady-stale  and 
terminating  performance  measures  were  studied.  The  positive  results  that 
we  obtained  for  the  steady-stale  gradient  estimation  problem  arc  of  par¬ 
ticular  interest,  in  light  of  the  somewhat  pessimistic  conclusions  reached 
in  previous  theoretical  and  empirical  work  (see,  for  example,  [7],  (I!)],  and 
[20]).  Thus,  the  results  obtained  here. suggest  that  the  steady-stale  likeli¬ 
hood  ratio  gradient  estimator  can  be  quite  efficient  when  implemented  in 
an  appropriate  problem  setting. 

2.  The  paper  describes  one  of  the  few  successful  implementations  of  sophisti¬ 
cated  variance  reduction  techniques  within  a  widely  distributed  simulation 
software  package,  namely  the  System  Availability  Estimator  (SAVE)  (sec 
(1 1]  and  [12])  developed  with!:*  IBM.  The  •"vianc.c  reduction  techniques 
that  are  described  w*ilhin  this  paprr  have  been  implemented  so  as  to  lie 
invisible  to  the  user. 
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3.  Because  of  tlic  high  degree  of  dependability  of  the  systems  typically  simu¬ 
lated  by  SAVE,  rare  event  simulation  techniques  (specifically,  importance 
sampling)  are  used  extensively  within  the  package  (so  that  failures  ran  be 
observed).  This  paper  describes  how  to  combine  likelihood  ratio  gradient 
estimation  and  importance  sampling. 

This  paper  shows  how  “discrete-time  conversion”  can  be  applied  to  the 
steady-state  likelihood  ratio  gradient  estimator  (see  also  (5)  and  [20]).  This 
method  reduces  variance  by  removing  variability  due  to  the  exponential 
holding  lime  variates  associated  with  the  continuous-lime  Markov  chain 
that  is  being  simulated. 

5.  The  computational  burden  imposed  upon  SAVE  by  the  variance  reduc¬ 
tion  techniques  and  likelihood  ratio  gradient  estimator  ran  be  significant, 
for  example,  the  numerical  function  evaluations  required  to  compute  the 
analytically-derived  partial  derivatives  associated  with  the  gradient  esti¬ 
mator  arc  time-consuming.  Section  !»  describes  various  ideas  used  wit  bin 
SAVE  to  improve  the  computational  efficiency  of  the  estimator. 

6.  Certain  theoretical  loose-ends  concerned  with  the  likelihood  ratio  gradient 
estimation  technique  are  addressed  within  the  paper.  In  particular,  it  is 
shown  that  for  finite-stale  continuous-time  Markov  chains,  the  “amiabili¬ 
ty”  assumption  described  in  [20]  and  used  in  [5]  is  essentially  always  valid 
for  reasonable  performance  measures  (sec  the  Appendix  to  this  paper). 
Also,  it  is  shown  that  “discrete  time  conversion"  applied  to  our  gradient 
estimators  is  guaranteed  to  give  a  variance  reduction. 

This  paper  is  organised  as  follows.  Section  2  describes  the  basic  mathemati¬ 
cal  model  that  is  simulated  by  SAVE.  In  Sections  3  and  1,  respectively  likelihood 


2 


f 


e 


ratio  gradient  estimation  for  transient  and  steady-state  performance  measures 
is  discussed.  Scr.tion  -1.3  also  discusses  certain  insights  that  were  obtained  by 
analytically  analy7ing  the  behavior  of  the  likelihood  ratio  gradient  estimator  for 
a  couple  of  (very)  small  models.  In  Section  5,  implementation  issues  are  dis¬ 
cussed.  Section  G  is  devoted  to  a  description  and  discussion  of  the  experimental 
results  obtained  through  extensive  simulations  of  several  large  models  having 
more  than  a  million  slates.  Section  7  discusses  future,  research  directions.  The 
concluding  Appendix  contains  most  of  the  theoretical  material  alluded  to  in 
Item  G  above. 

2  Problem  Setting 

In  this  section  we  briefly  discuss  the  modeling  problems  being  addressed  by  the 
SAYF,  package  (l 0]  and  describe  the  basic  mathematical  model  being  simulated. 
Wc  also  describe  various  performance  measures  associated  with  the  models  wc 
consider  here. 

2.1  Modeling  Highly  Dependable  Systems 

SAVE  has  been  designed  to  construe!  and  solve  stochastic  models  of  fanll- 
tolcrant  computers.  Fault-tolerant  computing  has  been  applied  to  two  fun¬ 
damentally  different  classes  of  applications.  One  deals  with  mission  oriented 
systems  with  high  reliability  requirements,  such  as  space  computers,  avionics 
systems,  and  ballistic  missile  defense  computers  (sec  [•!]).  For  the  mission  to 
succeed,  the  system  must  not  fait  during  the  mission  time.  Hence,  the  prol>- 
ability  that  the  system  does  not  fail  during  the  mission  time,  i.e.  the  system 
reliability,  is  a  measure  of  interest.  Mean  time  to  system  failure  is  another  mea¬ 
sure  that  is  used  to  evaluate  such  systems.  The  other  class  of  applications  deals 
with  continuously  operating  systems  with  high  availability  requirements,  such 
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as  telephone  switching  systems,  general  purpose  computer  systems,  transaction 
processing  systems  (e.g.  airline  reservation  systems),  and  communication  net¬ 
work  computers.  For  such  systems,  system  failures  can  be  tolerated  if  they  occur 
infrequently  and  they  result  in  short  system  down  times.  For  such  systems,  the 
expected  fraction  of  lime  the  system  is  operational,  i.e.  the  system  availability, 
is  a  measure  of  interest. 

From  the.  modeling  point  of  view,  a  system  consists  of  a  finite  collection  of 
hardware,  and  software  components,  each  of  whirh  may  be  subject  to  failure,  re¬ 
covery,  and  repair.  Software  components  in  operation  can  also  be  modeled  with 
constant  failure  rates  (see  (l"j).  Component  interactions  often  have  a  substan¬ 
tial  effort  on  system  availability  and  must  therefore  lie  considered  in  addition  to 
the  individual  component  behaviors.  The  stale  spare  size  of  such  models  grows 
(often  exponentially)  with  the  number  of  components  being  modeled.  There¬ 
fore,  SAVE  provides  a  high  level  modeling  language  containing  constructs  whirh 
aid  in  representing  the  failure,  recovery  and  repair  behavior  of  components  in 
the  system  as  well  as  important  component  interactions. 

If  time  independent  failure  and  repair  Tates  are  assumed  then  a  .mite  state 
space,  time  homogeneous  continuous  time  Markov  chain  can  be  constructed  au¬ 
tomatically  from  the  modeling  constructs  used  to  describe  the  system.  Since 
the  size  of  Markov  chains  grows  exponentially  with  the  number  of  components 
modeled,  simulation  appears  to  be  a  practical  way  for  solving  models  of  large 
systems.  However,  the  standard  simulation  takes  very  long  simulation  runs  to 
estimate  availability  and  reliability  measures  because  the  system  failure  event  is 
a  rare  event.  Therefore,  variance  reduction  techniques  which  can  aid  in  comput¬ 
ing  rarc-evcnt  probabilities  quickly  arc  of  interest.  Specifically,  the  Importance 
Sampling  technique  has  been  found  to  be  most  useful  to  estimate  the  various 
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dependability  measures  (sec  (12}).  In  this  paper,  we  consider  the  gradient  esti¬ 
mation  problem  for  these  measures.  We  use  one  change  of  measure  to  compute 
the  gradient  using  the  likelihood  ratio  gradient  estimation  technique,  and  we  use 
another  change  of  measure  (importance  sampling)  to  compute  these  gradients 
quickly. 

2.2  Mnrkovian  Model 

Suppose  Y  =  {V,  :  .«  >  0}  is  an  irreducible,  continuous  time  Markov  chain 
with  slate  space  E  and  infinitesimal  generator  Q(0)  =  {ij(0,i,  j)  :  i,j  €  Tv } . 
where  0  is  in  some  open  set.  Q.  We  use  the  notation  that  P$  and  E$  represent 
the  probability  measure  and  expectation,  respectively,  induced  by  the  generator 
matrix  Q[0)  for  some  value  of  0.  We  assume  that  E  can  be  partitioned  into 
two  subsets:  E  =  0  U  F.  where  0  is  the  set  of  up  states,  i.e.  the  set  of  states 
for  which  the  system  is  operational,  and  F  is  the  set  of  down,  or  failed,  stales. 
We  assume  that  the  system  starts  out  in  the  stale  in  which  all  components  are 
operational;  wc  label  this  slate  as  state  0. 

Let  X  =  {A',  :  n  >  0}  be  the  sequenre  of  slates  visited  by  the  chain 

and  t„  be  the  time  spent  in  each  slate,  where  n  >  0.  Also,  we  define  X,  = 

(A'n, A't,..., A‘«).  Recall  X  is  a  discrete  time  Markov  chain  (l)TMC)  with 
transition  matrix  P(0)  defined  by  P(fl, «’,  j)  =  >l{0,i,j)/q(0,i)  for  i  j  and 
r(0,  i,  i)  =  0,  where  q(0,  i)  =  -q(0, »,  i).  Furthermore,  conditional  on  X,  the 
f«’s  are  independent  exponential  random  variables  for  w'hich  the  (conditional) 
mean  of  <«  it  l/q{0,  A'«). 

Define  (T,  :  n  >  0}  as  the  transition  times  of  Y,  i.e.  To  =  0,  and  Tn  = 

*o  +  U  +•••  +  f«~i  for  «  >  1.  Then  define.  N(l)  —  sup{n  >  0 :  Tu  <  l). 

Let  T  denote  a  slopping  time  satisfying  assumption  A5  in  the  appendix. 
Also,  for  any  set  of  states  A,  wc  let  a  a  denote  the  lime  the  CTMC  first  enters 
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the  set  As  i.c.  =  inf{.i  >  0  :  V,_  j?  A, Y,  €  .4}.  Of  particular  interest  are 
oo,  which  is  the  first  return  time  to  state  0,  and  or,  which  is  the  first  entrance 
time  into  the  subset  F  of  failed  states.  Our  goals  are  to  estimate  (1)  some 
performance  measure  r((?)  ss  EfZ(0),  where  Z(0)  is  some  (measurable)  function 
of  Y  and  (possihly)  0,  and  (2)  its  gradient  r'(0)  =  ^jr(ff).  By  varying  our  choice 
of  the  function  Z,  we  can  compute  many  different  performance  measures. 

2.5  Performance  Measures 

We  will  be  interested  in  two  types  of  dependability  measures  associated  with  the 
CTMC  Y:  transient  measures  and  so-called  steady-stale  measures.  Considering 
the  transient  measures  first,  the  interval  availability,  Aft),  is  defined  by 

^(0=7  /  ,|V.€0|rfV 

*  J *=rf» 

This  is  the  fraction  of  time  that  the  system  is  operational  in  the  time  interval 
(0, 0-  Wc  let 

/(0  =  F*M(01 

be  the  expected  interval  availability  and  let 

r(t,T)  =  r,{A(t)<T) 

denote  the  distribution  of  availability.  The  reliability  of  the  system  is  defined 
to  be  the  probability  that  the  system  does  not  fail  in  the  interval  (0,  (): 

K(0  =  ^ {a r  >  0  =  £*()(•,>!}]. 

For  steady-state  measures  wc  assnme  that  Y  is  irreducible,  in  which  case 
Y,  ^  Y  as  «  —  oo,  where  =*■  denotes  convergence  in  distribution  and  Y  is 
a  rv  having  ihc  str.idy-statc  distribution  r  —  {t;,  i  e  E)  (rr  solves  the  e- 
qualions  *Q  =  0).  Notice  that  steady-stale  measures  are  independent  of  the 
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starling  slate  of  the  system;  however,  we  will  choose  the  fully  operational  s- 


tate  (i.e.,  state  0)  to  define  a  regenerative  stale  for  the  system.  Also,  we 
assume  that  when  computing  steady-state  measures,  we  can  express  Z(0)  as 
Z(0)  =  lim(_00  }  l»)d«,  where  /(0,  •)  is  a  real- valued  function  on  E 

which  satisfies  assumptions  A7  and  A8  in  the  appendix.  By  regenerative  pro¬ 
cess  theory  (sec  (2]),  our  steady-state  measures  lake  the  form  of  a  ratio  of  two 
expected  values: 


r  =  E»\Z(0)\  = 


If  /(0,t)  =  then  E$[Z(0)\  is  the  long  run  fraction  of  time  the  system 

is  operational  and  is  called  the  steady-state  availability;  which  we  denote  by 
A  =  lim,_«.  7vf(/t(()}.  We  will  sometimes  find  it  convenient  to  consider  the 
expected  unavailability  U(l)  —  1  —  /(/)  =  1  -  fi»[/t(f)]  and  the  steady-state 
unavailability,  V  =  1  —  A.  The  problem  of  steady-state  estimation  thus  reduces 
to  one  of  estimating  the  ratio  of  two  expected  values. 

The  mean  time  to  failure  (MTTF),  /vi[ar].  is  typically  thought  of  as  a 
transient  measure,  since  it  depends  on  the  starling  stale  of  the  system  (state 
0),  which  is  assumed  to  be  the  fully  operational  state.  A  ratio  representation 
for  E#(of]  is  found  to  be  particularly  useful  and  is  given  by 

,  ,  E#(min(«F,«(,)] 

E,M " '  &*;<*}■ 


The  derivation  of  this  formula  is  given  in  [12].  Thus,  we  can  view  estimating 
E«(ar]  as  a  ratio  estimation  problem,  where  both  the  numerator  and  the  de¬ 
nominator  are  estimated  using  a  regenerative  simulation.  Therefore,  in  Section  4 
we  consider  the  estimation  or  the  mean  lime  to  failure  (MTTF)  together  with 
steady-slate  measures  which  are  also  (and  more  commonly)  estimated  using 
regenerative  simulations. 
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?>  Estimating  Transient  Performance  Measures 

Recall  that  our  goals  are  to  estimate  r(0)  =  E»Z(0)  and  its  derivative  r'(ff)  with 
respect  to  the  parameter  0.  In  the  case  of  transient  performance  measures,  we 


assume  that  our  function  Z(0)  has  one  of  the  two  following  forms: 

1.  Z(0)  —  I5,  where  S  is  some  (measurable)  set  of  events. 

2.  Z{0)  —  /(0,  V,)rfs,  where  T  is  some  stopping  time  satisfying  assump¬ 

tion  A5  given  in  the  appendix  and  / (0,-)  is  a  real-valued  function  on  E 
satisfying  assumptions  A7  and  A8  in  the  appendix. 

Wc  define  the  “likelihood"  of  a  sample  path  under  parameter  0  as 

r'v(r)  l 

MT'O)  =  \H  q(O.Xk)cxv[-q(O.Xk)tk)l\O..Xk,.Xk+])\ 

*•  *=n  -  ■* 

and  the  likelihood  ratio  is  given  by 

L(T,  0,  fl0)  =  d,i(T,  0)/M T,  On),  (3. 1 ) 

where  On  is  some  fixed  '■alue  of  0. 

Our  performance  measure  is  given  by 

r(0)  =  E,Z(0)  =  E,,Z(0)W\0s0n). 

We  call  this  transformation  a  “change  of  measure"  since  we  are  now  computing 
the  expectation  based  on  a  different  parameter  value.  The  validity  of  the  change 
of  measure  is  discussed  in  (lj  and  [12].  By  performing  the  change  of  measure, 
the  expectation  operator  is  now  independent  of  the  parameter  0. 

If  wc  formally  differentiate  this  expression,  assuming  that  we  can  interchange 
the  derivative  and  expectation  operators,  wc  have  that  by  applying  the  product 
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rule  of  differentiation 


t'(0)  =  Ee„Z'(0)L{l\0.0n)  +  Ei//(0)L'(T,  6, 0n), 

where  Z'{0)  =  0  if  Z{0)  has  form  1  above,  and  Z'(0)  =  fj  J'(0 ,  Y,)d*  ir  Z(0) 
has  form  2  above,  and 

■*»»l  I 

-  n\0-  A'n(7)4i)(7‘  -  rNm)  ]  //(7‘,  0,0n).  (3.2) 

The  proof  of  (he  validity  of  the  interchange  of  derivative  and  expectation  is 
given  in  Theorem  1  in  the  appendix. 

The  terms  simplify  when  we  evaluate  r'(fl)  at  the  point  0  =  0<„  In  this  case, 
we  have  that  since  L(T.0u.0n)  =1. 

r'(0„)  =  E,.7/(0„)  +  Et,Z(0«)f/(T,  00,  On)  (3.3) 


I 'IT  0  0  \  V'NIT)  »/«  ..  «.  I  Aj 

b[T,(ln,0n)  =  E.„,  +  7Tfcv,;.V, 


—  A'W(T)+i  )(r  •*  7w(T))- 


*+i)J 

(3.4) 


Note  that  if  7’  is  either  the  time  of  the  first  transition  after  a  deterministic  time 
t  or  a  hilling  time  to  a  set  F,  the  last  exponential  term  drops  out. 

The  stopping  time,  say  7j,  used  in  the  likelihood  ratio  nerd  not  be  the 
same  as  the  stopping  time,  say  7j,  used  for  the  function  Z(0o).  Ho.  er,  we 
always  need  to  take  T\  >  7  j,  with  strict  inequality  possible.  For  example,  when 
computing  reliability  at  a  time  <,  note  that  Z(0n)  is  F«*measurable,  while  we  can 
use  the  likelihood  ratio  based  on  Fy,.,.,,,,  where  T\  is  the  sigma-freld  generated 
by  the  process  up  to  time  f. 


These  results  arc  similar  to  the  results  derived  in  In  [20],  only  the  special* 
ization  to  Poisson  processes  is  discussed.  Also,  the  "amiability”  assumption 
discussed  in  [5]  and  [20]  holds  in  the  current  context,  and  examples  of  perfor¬ 
mance  measures  satisfying  this  condition  are  discussed  in  Section  2.3  and  in  the 
Appendix. 

3.1  Importance  Sampling  to  Reduce  Variance 

We  can  now  apply  another  change  of  measure  to  implement  the  importance 
sampling  to  obtain 

r'(0o)  =  Er7/(0„)HT,0n,0*)  +  /v,.Z(0u)f/(7\0„,0n)M7’,0,,,0*),  (3.5) 


where  0*  is  the  parameter  value  used  for  the  importance  sampling  change  of 
measure.  /,(7*,0n,0 *)  is  the  same  as  Equation  3.1  except  with  (0o,0*)  playing 
the  role  of  (0,  0P)  (as  it  should  be).  Note  that  we  can  use  two  different  changes 
of  measure,  i.c.  two  different  values  of  0*,  to  estimate  the  two  expectations 
on  the  right  hand  side  of  Equation  3.5.  Also,  the  value  of  0*  used  after  the 
k **  transition  can  depend  on  X*,  the  entire  sequence  of  slates  visited  up  to 
that  point,  and  also  on  7*,  the  time  or  the  ka  transition.  We  call  This  method 
"dynamic  importance  sampling"  (D1S).  These  ideas  are  discussed  in  Section  5 
and  also  in  (10). 

We  can  actually  separate  the  likelihood  ratio  into  two  different  components, 
the  first  including  only  the  transition  probabilities  of  the  embedded  DTMC  and 
the  second  incorporating  only  the  random  holding  times,  i.c. 


L(7*,#o,0*  ,0**)  =  M7-,0o,0*)Mr,0P,0"), 


where 


lx  (T,  00.0*) 


t?  r(80,xk,xk») 

P(0*,A'*,A’4+i) 


(3.6) 
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(3.7) 


h(T,W) 


^  <?(<?», A‘t)cxp{-q(go,A\)U} 
*=n  <*(#**.  A't) exp{-9(0**, A'*)<*} 

txy{-q{8n, X nit)+\)(T  -  Tn(t))} 
cxp{-7(fl**,  XN{rHi)(T  -  Tn(t))) 


Thus,  we  can  apply  different  changes  of  measure  to  the  two  components  of  the 
likelihood  ratio,  which  allows  ns  to  tailor  each  change  of  measure  for  a  specific 
purpose. 

Lewis  and  Rohm  (IS)  presented  an  importance  sampling  technique  for  es¬ 
timating  transient  measures.  They  apply  "failure  biasing"  to  the  embedded 
DTMC;  this  causes  failures  to  occur  with  higher  probability  and  therefore  quick¬ 
ly  moves  (biases)  the  DTMC  towards  the  set  of  failed  states.  This  change  of 
measure  is  incorporated  in  the  first  component  <’>f  the  likelihood  ratio  /<j.  They 
also  apply  "forced  transitions"  to  the  holding  time  in  state  0  (the  stale  with  all 
components  operational)  to  the  estimation  of  reliability.  This  forces  the  next 
component  failure  to  occur  before  time  /.  Specifically,  if  A’«  =0  and  Tn  <  f, 
then  the  next  holding  time,  („+)  is  forced  to  be  between  zero  and  t  -  7,  by 
selecting  /,+i  from  the  conditional  density  given  by 


Aoc-*"1’41 

J  —  —  In)  * 


where  0  <  f„+1  <  l  -  7,  and  An  is  the  total  failure  rale  in  slate  0.  This  change 
of  measure  is  incorporated  into  the  second  part  of  the  likelihood  ratio,  7»j.  The 
simulation  continues  until  time  T  =  min(as-, N(t)  +  1). 

Note  that  h(fa+t|A'll,<li)  is  not  positive  whenever  the  exponential  density 
is,  and  so  the  standard  theory  of  importance  sampling  says  that  this  is  not  a 
legitimate  change  of  measure.  However,  in  this  case  h(fa+i  |A’»,  /„)  is  positive 
over  that  part  of  the  sample  space  ({w :  nF  <  «))  that  counts,  which  is  sufficient 
(see  (9)). 


II 
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lively,  and  <t\  -  Var[Zj-,j(0)  -  r7)]/E|7}]J.  See  |2]  for  further  details. 

If  one  formally  differentiates  the  expression  for  t(0)  by  interchanging  the 
derivative  and  expectation,  one  obtains 


where 


(‘<•1) 


=  r^zT(on) 

=  p:»MOu)  +  EM^)i'(T.(i^on) 


=  E,„7L'(7AA) 


^.((?n)=  53 /'(<?„,  A\)f* 
/.'(7-A.fln)  =  jr  + 


'Ai,  At,  .1 J+I , 


~[«7(0n,At)  "("<■.  At,  At+i)  J 

The  proof  of  the  validity  of  the  interchange  of  the  operators  is  given  in  the 
appendix. 

In  order  to  construct  confidence  intervals  for  our  estimate  of  r'(0),  we  need 
an  expression  for  its  asymptotic  variance.  This  is  given  by 

fit* a  +  +  Jrfr'c  + 

-  •jb'TAC  -  HrvAD  (4.2) 

_2'(  -Kat(ffnc  _  Js-(r^Si(rBD  +  *f<rCp|, 

where  <r,v  =  K«r(A'),  «r.w  =  C0l'(A\V), 

A  =  Z'T(0a)  +  ZT(0o)L'(rt0oA->)  (4.3) 
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T 

(4,1) 

7,r{0  n) 

(4.5) 

77/(7*.  0,„  0„), 

(1.6) 

and 


a  =  E#„  /I 
fl  =  Ki.B 
1  =  Fi,(' 
f  =  £*„/>. 


A  proof  of  (hr  validity  of  the  expression  for  (lie  vnrianee  is  given  in  [20).  However, 
we  give  a  simpler  and  cleaner  proof  in  the  appendix. 


4.1  Conditioning  to  Reduce  Variance 


Conditional  Monte  Carlo  is  a  technique  which  can  be  used  to  reduce  the  variance 
in  simulations  of  CTMCs  (Sec  [3]  and  [13]).  By  conditioning  on  the  embedded 
DTMC  X,  wc  arrive  at  what  is  known  as  the  discrete  lime  method,  in  which  the 
holding  times,  («,  are  replaced  by  their  (conditional)  means,  )/q[0,  A’,).  There 
arc  two  advantages  of  using  this  approach.  First,  since  we  replace  the  random 
holding  times  I,  with  their  (conditional)  means,  wc  do  not  have  to  generate 
exponential  variates.  Also,  as  discussed  in  [3]  and  [13],  this  transformation  is 
guaranteed  to  give  a  reduction  in  the  variance  of  the  estimate  of  r(0).  Wc 
also  show  that  the  transformation  is  guaranteed  to  redure  the  variance  of  the 
estimate  of  t'(0). 

Using  conditional  Monte  Carlo,  we  obtain  another  ratio  formula 


r(<>)  = 


E,E,[Z  tWIX]  Et\G(6)] 

E,E,\T\X\  "  £■#[//(#)]  ’ 


(4.7) 


M 


where  a  straight  forward  calculation  shows  that 


r»-1 

«(*)  = 

4=n 

(4.8) 

Tf»-l 

L_n 

(4.9) 

a— t’ 

9(0.  i)  =  J(0.i)/q(0,i).  i  €  E 

l>(0,  »)  =  1  /<?((?, «),  «  €  E 

and  r„  is  (lie  first  return  time  of  the  DTMC  to  slate  0. 

In  (6),  it  is  shown  that  under  certain  conditions  (viz.,  assumption  A  t  given 
in  the  appendix)  that 

r((l)  = 

1  ^  EfJI(0)L(r(„0,0(ty 

where  L(t(\i  0, 0n )  is  the  DTMC  likelihood  ratio,  which  is  defined  as 

r*-l 


7.(r0, <?,(?")  =  H  P(On'xi‘xw)‘ 


(4.10) 


A  simple  calculation  shows  that  /„(rn,0.flo)  =  E»I^(TT,,<?,fln)|X].  Note  that  1. 
is  the  same,  as  L\  defined  in  Equation  3.6.  If  one  formally  differentiates  (his 
expression  by  interchanging  the  derivative  and  expectation,  one  obtains 


(*»)■ 

where 


(4.11) 


«((?„)  =  F^GiO,) 

u'(t?n)  =  Et,G'(0„)  +  £|,G(flfl)/;(r0,fln,flo) 

f(0o)  =  E,  .//(ft,) 

i'{Oo)  —  E$,  II'(6q)  +  E$,  H  (t>o)  //(to,  t?o,  do) 
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C»(f„)  -  V  JI{0lU  Xkhi°0' Xk)  - q,[K  Xi)I{0("  Xk)  (4.12) 
°  9*(0n,A*) 


)  _  y*'  Wo.^) 


/./  *  a  \  _  Ty  ^o.-’Vt.A'^O 


(4.13) 


(4.14) 


The  proof  of  the  validity  of  (lie  interchange  of  the  operators  in  this  ease  is  given 
in  (G). 

Let  rrf  and  <r§  he  the  variances  of  the.  gradient  estimators  when  using  the 
ratio  formula  obtained  without  and  with  conditional  Monte  Carlo,  respectively, 
i.e.  <Tp  which  is  given  in  Equation  4.2.  is  the  asymptotic  variance  of  the  esti¬ 
mator  of  Equation  4.1,  and  <rj  is  the  asymptotic  variance  of  the  estimator  of 
Equation  1.11.  Then,  we  have  that  «r\  <  rrj,  which  stales  that  when  using  the 
ratio  formula  conditional  Monte  Carlo  always  givrs  rise  to  a  lower  asymptotic 
variance  constant  (see  Proposition  1  in  the  appendix). 


4.2  Importance  Sampling  to  Reduce  Variance 


As  in  Section  3.1,  we  can  use  importance  sampling  by  applying  another  change 
of  measure.  However,  in  this  case  since,  we  use  conditional  Monte  Carlo  to 
condition  out  the  holding  times  in  each  state  whrn  estimating  steady-state  per¬ 
formance  measures,  the  likelihood  ratio  only  consists  of  its  first  component  L\ , 
given  in  Equation  3,6,  or  equivalently,  A,  given  in  Equation  4.10. 


4.3  Two  Simple  Examples 

In  this  section,  we  consider  two  simple  availability  examples.  The  first  is  a  one- 
dimensional  birth  and  death  process  with  three  states,  which  was  also  analyzed 
in  (12],  and  the  second  is  a  two-dimensional  five  slate  birth  and  death  process. 
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Bcrausr  of  their  simple  structure,  we  are  able  to  tin  an  extensive  amount  of  anal¬ 
ysis  on  these  models.  Herall  that  wc  defined  the  sensitivity  of  a  performance 
measure  with  respect  to  a  certain  parameter  to  be  the  product  of  parameter 
itself  multiplied  hv  the  gradient  of  the  performance  measure  with  rcsprcl  to  the 
parameter.  Therefore,  a  sensitivity  measures  the  effect  of  relative  changes  to 
parameter  salues  on  a  performance  measure,  and  so  relative  changes  in  the  pa¬ 
rameters  corresponding  tn  the  largest  sensitivities  will  cause  the  largest  change 
in  lh»*  performance  measure.  We  will  show  that  when  one  sensitivity  is  much 
larger  in  magnitude  larger  than  another,  its  relative  accuracy  is  much  greater 
than  that  of  the  smaller  sensitivity.  In  addition,  we  can  estimate  the  sensitivi¬ 
ties  with  the  largest  orders  of  magnitude  with  about  the  same  relative  accuracy 
as  the  regular  estimate,  as  long  as  each  sample  (e.g.  a  regenerative  cycle  in  the 
case  of  steady-state  estimation)  consists  of  only  a  few  transitions.  This  is  true 
in  the  highly  reliable  component  situation  which  we.  consider  in  this  paper.  Wc 
measure  the  relative  accuracy  by  the  squared  coefficient  of  variation.  Much  of 
the  analysis  was  done  using  the  symbolic  manipulator  Scratchpad  [25]. 

\Yrc  define  the  vector  of  parameters  0  which  we  compute  sensitivities  with 
respect  to  as  the  vector  of  all  continuous- valued  parameters  of  the  model.  Note 
that  the  above  claims  depend  on  the  parameterization  of  the  model.  However,  in 
the  reliability  context  which  wc  are  considering  in  this  paper,  there  is  a  natural 
parameterization  of  the  model,  which  is  to  have  0  consist  of  the  values  of  all  of 
the  component  failure  rales  and  repair  rates.  With  0  defined  in  this  manner, 
the  claims  seem  to  hold. 
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4<S.i  A  Three  State  Example 


The  three  state  example  can  be  viewed  as  a  reliability  system  in  which  there  is 
one  type  of  component  with  redundancy  of  two  and  the  components  can  fail  and 
be  repaired.  The  components  have  failure  rate  A  and  repair  rate  fi.  The  slate 
spare  is  E  =  {0. 1, 2}.  We  assume  that  births  correspond  to  failures  and  deaths 
correspond  to  repairs  so  that  state  i  corresponds  to  having  i  failed  components. 
We  consider  the  system  to  be  operational  in  states  0  and  1  but  failed  in  stale  2. 

The  transition  matrix  P  of  the  embedded  DTMC  has  the  following  non-zero 
entries:  r( 0,  t)  =  /’(2. 1)  =  I,  /’(1 ,2)  =  A/(A  +  /i),  and  /’(l,0)  =  #i/( A  +  /»). 
Using  the  method  of  conditional  Monte  Carlo,  we  let  /»;  be  the  mean  holding 
time  in  stale  ».  Thus,  hn  =  1/(2A);  h\  =  l/(A  + /i),  and  hj  =  1//«.  Sinre  we  are 
working  with  highly  reliable  systems,  we  assume  that  A  <  /i.  We  can  further 
assume  that  //  =  I,  since  this  only  fixes  the  time  scale. 

We  arc  interested  in  the  steady-state  unavailability  r,  which  is  the  steady- 
state  probability  of  being  in  the  failed  slate  2.  Recall  that  we  ran  estimate 
this  quantity  using  the  regenerative  method  and  can  express  r  as  the  ratio 
E[G\/E[II]t  as  in  Equation  *1.7.  In  this  example,  we  set  /(0)  =  /(I)  =  0  and 
/( 2)  =  1.  Wc  assume  that  slate  0  is  the  regenerative  slate.  The  numerator  in 
our  rath*  formula  can  be  explicitly  written  as  G  —  rtph^,  and  the  denominator 
can  be  expressed  as  II  =  hn  +  hj  +  np(h j  +  hj).  where  tip  is  the  number  of 
times  the  failure  stale  is  reached  in  the  regenerative  cycle.  Note  that  tip  has  a 
geometric  distribution,  so  that  Ejnjr]  =  A ///  and  Varjn /r]  =  A(A  +  p)//is.  Thus, 
Zv[G]  =  hjX/fi  and  E\H]  =  (/in  +  ^i)  +  (/i|  +  AjJA/p.  As  shown  in  [12],  we  have 
that  r  =  0(A?)  and 


cr!(f,m) 


Var{C  -  r //) 
mr7(EII)i 
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where  CY*{r,m)  denotes  the  asymptotic  squared  coefficient  of  variation  or  our 
estimate  of  r  after  m  regenerative  cycles  (we  modify  Knuth's  ©-notation  [1 4) 
to  mean  f(x)  s  Q(tj{x))  if  there  exist  constants  C \  and  such  that  for  all  x 
sufficiently  sinalh  0  <  C\g(x)  <  f(r)  <  Cjg(r)  ). 

Straightforward  calculations  show  that  the  sensitivities  t\  =  ©(A7)  and 
r„  r:  ©(A7),  where  we  use  the  notation  r#  =  0  ■  dr/dO.  Using  the  asymp¬ 
totic  variance  from  the  Central  Limit  Theorem  for  gradient  estimators  from 
Section  -I.  we  can  arrive  at  the  asymptotic  squared  coefficients  of  variation  of 
our  sensitivity  estimates,  which  are  given  by  0’7(rA, m)  =  0(l/A)/m  and 
(71’7(r,,.m)  =  ©(l/A )/tn.  All  of  the  variance  and  covariance  terms  in  the 
expression  for  the  asymptotic  variance  of  the  gradients  were  used  in  the  calcu¬ 
lations  in  this  example.  It  turns  out  that  the  dominant  terms  in  the  expression 
for  the  variance  of  the  gradients  are  terms  involving  the  variances  of  the  down 
lime  in  a  cycle  G  and  its  gradients. 

Thus,  when  A  <  /i,  we  have  that  both  of  the  sensitivities  are  of  the  same 
order  as  the.  regular  estimate,  and  the  relative  accuracies  of  the  regular  estimate 
and  the  sensitivities  are  about  the.  same. 

4.S.2  A  Five  State  Example 

The  five  state  example  models  a  system  with  two  types  of  components,  each 
of  which  has  redundancy  of  two.  There  is  also  the  added  restriction  that  once 
a  component  of  one  type  fails,  the  components  of  the  other  type  cannot  fail 
until  the  state  with  all  components  operational  is  reached.  Thus  the  state  space 
of  this  example  is  E  -  {(0,0),(1,0),(2,0),(0,  J ),  (0, 2)},  where  in  slate  («,ji), 
i  represents  the  number  of  failed  components  of  type  1,  and  j  is  the  number 
of  failed  components  of  type  2.  We  assume  that  the  regenerative  slate  is  the 


19 


state  in  which  all  components  are  operational.  We  consider  the  system  to  be 
operational  in  states  (0,0),  (1,0),  and  (0, 1),  but  failed  in  states  (2,0)  and  (0,2). 
We  let  A;  denote  the  failure  rate  of  component  type  i,  and  let  /<  denote  the  repair 
rale  of  both  types  of  components.  We  assume  that  A2  A)  <  p  =  1. 

The  transition  matrix  of  the  embedded  DTMC  has  the  following  non-rcro 
entries: 


r((n,o).(i.o))  = 
r(( !,»),  (2,0))  = 
/*((«,  o)r(o,o))  = 
r(M,(o,  j))  = 
/•((o.  i),(o,  2))  = 
r«o.i),(o,o))  = 
r((2,o),(i,o))  = 


A,/(A|  4  A2) 

*i/(*i  4  //) 

/i/(Aj  4  /*) 

A?/(A(  4  Aj) 

A?/(A?  4  11) 

/‘/(A?  4  ft) 

/’(( 0,2),(0,1))=1. 


W,c  let  h(ij)  denote  the  mean  holding  lime  in  slate  (»,  j).  Thus,  /i(o,o)  =  1/(2A|  4 
2A?),  =  l/(Aj  4  /1),  =  l/(Aj  4|i),  and  A(j,n)  —  f/( n,s)  =  1/i'. 

In  this  example,  we  set  /( 0,0)  s=  /(J,0)  =  /(0. 1)  =  Oand  /(2,0)  =  /(0,2)  = 
1.  Using  the  ratio  formula  again  to  estimate  (he  steady-state  unavailability,  we 
have  that  the  numerator  catt  be  explicitly  written  as 


and  the  denominator  can  be  expressed)** 


W  =  V.")  +  *f.Viw(J,n)j(A(j1n)  +  ni(A(j,o)  4*(},o))) 

+  *f-Vi*(o.l)l(fc(o,i>  +  n»(V,t)  +  Vs)))* 

where  A*  $  is  the  first  state  visited  by  the  embedded  DTMC,  nj  is  the  number 
of  times  state  (2,0)  is  visited  in  the  regenerative  cycle,  and  nj  is  the  number 
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of  limns  stain  (0,2)  is  visited  in  the  regenerative  ryclc.  Note  that  conditional 
on  A' i  =  (J.0),  r>i  has  a  geometric  distribution,  and  conditional  on  A'|  =  (0,1), 
n2  has  a  geometric  distribution.  Thus  /?(n 1 1 A* j  =  (1,0))  =  X\/fh  Var[ii||A'i  = 
(1,0))  =  A|(Aj  +/<)//<*.  /v!n2|A'i  =  (0,1))  =  W/‘.  »nd  Var[n2|A'i  =  (0,1))  = 
A2(A2  +  //)//i?.  Thus, 


E[G)  = 


A?  +  A* 
(A|  +  A2)/ij 


.....  2(X]  -f  A%  4  (Ai  4  A 2)/t)  4  it2 

11  =  2(A,  4  A2)//7 

Therefore,  assuming  that.  A2  <  A(  <  ;»  =  1,  we  have  that  r  =  Q(A^)  and 

.  ie(f ). 

mr*\hUy  m  J 

Straightforward  calculations  show  that  the  sensitivities  r»,  =  ©(A7)  and 
rX}  =  ©(A?  4  A2).  Using  the  asymptotic  variance  from  the  Central  Limit  Theo¬ 
rem  for  gradient  estimators,  we  can  arrive  at  the  asymptotic  squared  coefficient 
of  variation  of  our  sensitivity  estimates,  which  ate  given  by 


CV’K.m)  = -U>(^) 


*11 IV* 

CT7(rA„r.i)  =  +  +a?A7)‘ 

We  only  used  the  terms  involving  the  variances  of  the  down  time  in  a  cycle  and 
its  gradient  in  these  calculations  since,  when  A2  <  Aj  <  ji  =  1,  these  turn  out 
to  be  the  dominant  terms  as  they  were  in  the  three-state  example. 

From  this  example,  we  see  that  when  A2  <  Ai  <  /j  =  ],  the  sensitivity 
with  respect  to  A]  is  much  larger  in  magnitude  than  the  sensitivity  with  respect 
to  A2,  and  the  relative  accuracy  of  the  former  is  much  better  than  that  of  the 
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Thus,  we  see  that  in  these  two  examples  the  relative  accuracy  of  the  sensi¬ 
tivity  with  the  larger  magnitude  is  of  the  same  order  of  the  relative  accuracy  of 
the  regular  estimate.  Though  these  results  were  derived  for  simple  examples,  we 
can  sec  that  this  is  also  true  for  the  models  used  in  experimentation  in  Section  6 

5  Implementation  Issues 

In- this  section  we  consider  the  implementation  of  the  different  variance  reduc¬ 
tion  techniques  described  in  the  previous  sections.  These  techniques  have  been 
implemented  in  the  SAVE  package  so  that  large  models  can  he  simulated.  One 
salient  feature  of  our  implementation  is  that  we  use  one  simulation  run  for  esti¬ 
mating  all  the  measures  and  sensitivities.  Regenerative  simulation  is  used  with 
the  "all  components  operational"  state  as  the  icgenrrative  state.  The  event 
generator  simulates  only  the  embedded  Markov  Chain.  For  the  steady-state 
measures  we  accumulate  functions  of  the  mean  holding  times  in  the  various 
states  and  also  functions  of  the  gradients  of  menn  holding  times  and  transi¬ 
tion  probabilities.  For  the  transient  measures  we  accumulate  functions  of  the 
sample  holding  times  (from  exponential  distributions)  in  the  various  states  and 
also  gradients  of  the  transition  probabilities  and  densities  of  the  holding  limes. 
The  likelihood  ratio  for  transient  measures  is  different  for  each  of  the  different 
transient  estimators  allowing  ns  to  tailor  it  for  the  specific  application. 

We  have  employed  various  techniques  in  the  implementation  of  the  SAVE 
package  which  allow  the  CTMC  to  be  generated  quickly.  We  do  not  generate 
the  entire  state  space  explicitly  since  the  models  which  we  solve  using  SAVE  can 
have  millions  of  states.  Instead,  we  use  arrays  to  keep  track  of  the  number  of 
operational  and  spare  components  of  each  type  and  the  order  of  components  in 
the  repair  queue.  This  information  is  sufficient  to  determine  the  state  that  the 
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syslein  is  in.  In  addition,  arrays  arc  used  to  store  the  total  failure  rate  and  total 
repair  rate  of  each  type  of  component.  Also,  we  keep  track  of  the  sum  of  all 
of  the  failure  rates  of  all  operational  and  spare  components  and  the  sum  of  all 
the  repair  rates  of  components  currently  being  repaired.  Since  each  transition 
typically  alTccIs  only  one  component  (the  program  has  been  implemented  to 
allow  for  the  possibility  of  the  statuses  of  more  than  one  component  changing 
on  a  single  transition,  e.g.  one  component  failuic  causing  others  to  fail  also), 
we  are  able  to  update  the  arrays  and  the  two  sums  quickly.  By  having  these 
quantities  readily  available,  we  ran  easily  determine  the  transition  probabilities 
of  the  embedded  DTMC  and  also  the  total  rale  out  of  a  state. 

The  importance  sampling  for  the  embedded.  Markov  chain  is  based  on  the 
following  heuristics.  As  suggested  in  (12),  we  need  to  move,  the  system  very 
quickly  to  the  set  of  failed  states  F,  and  once  F  is  entered,  the  importance  sam¬ 
pling  should  be  turned  oIT  so  that  the  system  quickly  returns  to  state  0,  the  “all 
components  operational"  state.  We  achieve  this  by  increasing  the  probability  of 
failure  transitions  over  repair  transitions.  This  has  been  called  "failure  biasing" 
in  (15).  We  assign  a  combined  probability  Haul  to  thr  failure  transitions  in 
all  the  slates  where  both  failure  and  repair  transitions  are  feasible.  Individual 
failure  and  repair  transitions  are  selected  in  the  ratio  of  their  rates  given  that  a 
failure  or  a  repair  is  sei-  .ted,  respectively.  We  call  this  the  Biatl/Raiio  method, 
or  simply  the  Bia»t  method.  We  have  found  two  other  methods  useful  for  se¬ 
lecting  individual  failure  transitions,  given  that  a  failure  has  occurred.  The  first 
is  to  use  a  uniform  distribution  on  the  failure  transitions,  which  has  very  good 
performance  for  “unbalanced"  systems,  as  shown  in  Section  6  and  in  (12).  We 
call  this  the  Biatl  /Balancing  method.  The  second  is  to  give  higher  combined 
probability,  hiatl ,  to  those  failure  transitions  which  correspond  to  componen- 
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I  types  which  have  at  least  one  component  of  their  type  already  failed.  This 
exhausts  the- redundancy  quickly  and  has  much  better  performance  for  “bal¬ 
anced"  systems,  as  shown  in  Section  6  and  in  (12].  We  call  this  the  Bia*l/Bia*2 
method. 

For  the  steady-stale  unavailability  each  regenerative  cycle  corresponds  to  a 
sample.  We  use  cither  direct  simulation  or  the  D1S  method  given  in  Section  4.2 
to  estimate  steady-state  unavailability  and  its  sensitivities.  For  the  mean  time 
to  failure,  a  sample  ends  when  cither  the  regeneration  occurs  or  the  system 
enters  o;ic  of  the  system  failed  stales  from  the  set  F.  In  the  latter  case,  we 
continue  to  simulate  the  embedded  Markov  chain  until  the  regeneration  occurs 
before  starting  a  new  sample.  This  wastes  only  a  few  events  as  typically  a 
regenerative  cycle  is  short  (the  number  of  events  per  cycle  is  approximately 
twice  the  average  redundancy,  which  is  typically  two  or  three).  Once  again,  wc 
use  cither  direct  simulation  or  the  DI5  method  to  estimate  the  mean  time  to 
failure  and  its  sensitivities.  For  the  transient  measures  multiple  regenerative 
cycles  may  be  contained  in  a  single  sample.  Moreover,  a  sample  typically  ends 
cither  when  a  failure  occurs  or  when  the  time  interval  expires,  which  is  usually 
in  the.  middle  of  some  regenerative  cycle.  As  in  the  case  for  the  mean  lime 
to  failure,  we  continue  to  simulate  the  embedded  Markov  chain  until  the  next 
regeneration  occurs  before  starting  a  new  sample.  Separate  accumulators  for 
the  appropriate  likelihood  ratios  and  their  derivatives  are  maintained  for  each 
transient  estimator,  time  horizon  of  interest,  and  parameter  a  sensitivity  is 
computed  with  respect  to.  Thus,  all  measures  can  be  estimated  simultaneously 
from  a  single  simulation  run. 

In  the  SAVE  package,  the  user  is  able  to  compute  sensitivities  or  all  of 
the  performance  measures  with  respect  to  any  continuous  valued  parameter  of 
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the  model,  where  a  sensitivity  of  a  performanee  measure  with  respect  to  some 
parameter  is  defined  as  the  product  of  the  parameter  itself  multiplied  by  the 
gradient  of  the  performance  measure  with  respect  to  the  parameter.  In  its  most 
general  form,  SAVE  employs  a  symbolic  differentiator  to  compute  the  derivatives 
needed  during  the  simulation.  This  allows  the  use  of  complicated  expressions 
to  describe  the  parameters  of  the  system.  For  example,  one  could  specify  that  a 
failure  rate  of  some  component  to  be  5A|  +  4AjA.i,  and  then  compute  the  sensi¬ 
tivity  of  some  performance  measure  with  respect  to  the  parameter  Aj.  However, 
the  computation  of  sensitivities  is  somewhat  slow  using  this  technique,  with  the 
extra  CPU  lime  needed  to  compute  each  sensitivity  being  about  the  same  as 
the  time  needed  to  compute  the  regular  (non-gradient)  performance  measure. 
Therefore,  we  have  employed  special  techniques  in  the  implementation  of  the 
SAVE  parkage  to  allow  the  user  to  compute  sensitivities  with  respect  to  certain 
parameters  with  little  extra  computational  efTorl.  If  the  user  desires  to  compute 
sensitivities  with  respect  to  only  component  failure  rates  and  repair  rates  and 
these  rales  are  not  themselves  functions  of  other  parameters,  then  the  addition¬ 
al  CPU  time  needed  is  small,  as  shown  in  Section  6.2.3.  Note  that  when  in 
expressions  for  the  gradient  of  a  performance  measure  given  Equations  3.3  and 
3.4,  the  only  derivative  terms  which  we  have  to  compute  are  Z'(0)  and  q'(0,  x,  y) 
(since  both  q(0,x)  and  P(0,x,z)  can  be  expressed  solely  in  terms  of  q(0,  x,  y)). 
For  the  transient  performance  measures  we  consider  in  the  SAVE  package,  we 
have  Z'(0)  =  0.  In  the  case  of  steady-state  performance  measures  computed 
in  SAVE,  we  have  that  the  role  that  Z(0)  played  in  the  transient  measure  case 
has  been  replaced  by  G(0)  and  11(0)  since  we  use  conditional  Monte  Carlo  (sec 
Section  4.1),  which  depend  on  the  parameter  0  only  through  q(0,x,y).  Now 
note  that  q(0,x,y)  is  an  integer  multiple,  say  k,  of  either  a  component  failure 
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rate  or  repair  rate  (where  k  is  either  (he  redundancy  of  (he  r  ponenl  or  the 
number  of  busy  repairmen).  So  by  only  computing  sensitivities  with  respect  to 
simple  failure  and  repair  rales,  the  derivative  of  q(0,x,y)  is  cither  0  or  k,  thus 
allowing  us  to  bypass  the  symbolic  differentiator. 

6  Experimental  Results 

In  this  section,  \vc  will  discuss  the  results  of  simulations  of  two  different  models 
in  order  to  analvrc  the  behavior  of  gradient  estimates  via  the  likelihood  ratios 
method  and  to  demonstrate  the  effectiveness  of  different  variance  reduction 
techniques.  We  compare  the  sensitivities  to  the  regular  estimates  in  many  cases 
in  order  to  benchmark  our  results,  where  we  define  the  sensitivity  of  a  measure 
with  respect  to  a  parameter  to  be  the  product  of  the  gradient  of  the  measure 
with  respect  to  the  parameter  multiplied  by  the  value  of  the  parameter  itself. 
All  numerical  (non-simulation)  and  simulation  results  were  obtained  using  the 
SAVF/  package  ((1 1]). 

6.1  An  n-component  parallel  system 

The  purpose  of  the  first  set  of  experiments  is  to  examine  the  effect  of  the  length 
of  the  regenerative  cycle  on  the  variability  of  likelihood  ratio  gradient  estima¬ 
tors  of  stendy-statc  performance  mcnsurcs.  We  will  discuss  the  results  of  some 
simulations  of  an  n-component  parallel  system,  where  we  varied  the  number  of 
components  n  from  2  to  12.  In  order  for  the  system  to  br  operational,  there 
must  be  at  least  one  functioning  component.  7  he  repair  rate  /<  was  fixed  at 
1.0  for  all  values  of  n,  and  the  values  of  the  failure  rate  A  were  varied  so  that 
the  actual  value  of  the  steady-stale  unavailability  remained  fixed  at  0.001.  For 
each  value  of  n,  we  simulated  for  1, 02-1 ,000  events  and  formed  estimates  of  the 
steady-steady  unavailability  and  the  sensitivities  of  it  with  respect  to  A.  Table 
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1  contains  the  values  of  the  numerical  results  of  steady-state  unavailability  and 
its  sensitivity  with  respect  to  the  failure,  rate  A,  and  also  their  respective  point 
estimates  and  percentage  relative  half-widths  of  the  90%  confidence  intervals 
obtained  using  direct  simulation  for  each  of  the  experiments.  The  percentage 
relative  half-width  of  a  confidence  interval  is  defined  to  be  100%  times  the  con¬ 
fidence  interval  half- width  divided  by  the  point  estimate. 

It  is  interesting  to  note  that  for  small  values  of  n,  the  relative  size  of  the. 
confidence  intervals  of  the  sensitivities  are  close  to  the  relative  size  of  the.  con¬ 
fidence  intervals  of  the  estimates  of  the  steady-state  unavailability.  However, 
as  the  number  of  components  in  the  system  increases,  the  relative  accuracy  of 
the  sensitivity  estimates  degrades.  The.  reason  for  this  is  that  the.  number  of 
events  per  regenerative  cycle  is  increasing  as  the  number  of  components  in  the 
system  grows  since  we  have  adjusted  the  failure  rate  in  order  that  the  value 
of  the  steady-state  unavailability  remains  constant.  Since  the  gradient  of  the 
likelihood  ratio  turns  out  to  be  a  sum  of  random  variables,  as  the  regenerative 
cycles  become  longer,  we  arc  summing  up  more  random  variables,  which  in  turn 
leads  to  more  variability. 

6.2  Balanced  and  Unbalanced  Systems 

Thr  next  model  we  experimented  with  is  a  large  computing  system,  whose  block 
diagram  is  shown  in  Figure  6.2.  This  model  is  also  discussed  in  (12]  and  [16].  We 
use  two  different  parameter  sets  to  create  a  “balanced"  and  an  "unbalanced" 
system.  In  order  for  a  system  to  be  considered  balanced  it  must  satisfy  two 
criteria.  First,  each  type  of  component  has  the  same  amount  or  redundancy, 
(i.c.  the  same  number  of  components  of  a  type  must  fail  in  order  for  the  system 
to  become  nonoperational,  e.g.  l-out-of-2  of  a  type  has  the  same  redundancy 
as  3-oul-of-d  of  another  type).  Also,  the  failure  rates  of  all  of  the  components 
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Figure  ):  A  block  diagram  of  the  computing  system  modeled. 

must  be  of  the  same  order  of  magnitude.  A  system  that  is  not  balanced  is  called 
unbalanced. 

For  a  balanced  system  we  select  two  sets  or  processors  with  two  processors 
per  set;  two  sets  of  controllers  with  two  controllers  per  set,  and  six  dusters  of 
disks,  each  consisting  of  four  disk  units.  In  a  disk  cluster,  data  is  replicated  so 
that  one  disk  can  fail  without  affecting  the  system.  The  “primary"  data  on  a 
disk  is  replicated  such  that  one  third  is  on  each  of.  the  other  three  disks  in  the 
same  cluster.  Thus  one  disk  in  each  cluster  can  be  inaccessible  without  losing 
access  to  the  data.  The  connectivity  of  the  system  is  shown  in  Figure  6.2.  We 
assume  that  when  a  processor  of  a  given  type  fails,  it  has  a  0.01  probability  of 
causing  the  operating  processor  of  the  other  type  to  fail.  Each  unit  in  the  system 
has  two  failure  modes  which  occur  with  equal  probability.  The  failure  rates  of 
the  processors,  controllers,  and  disks  are  assumed  to  be  1/2000, 1/2000, 1/6000 
per  hour,  respectively.  The  repair  rates  for  all  mode  1  and  all  mode  2  failures  are 
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1  per  hour  and  1/2  per  hour,  respectively.  Components  are  repaired  by  a  single 
repairman  who  chooses  components  ai  random  From  the  set  of  failed  units.  The 
system  is  defined  to  be  operational  if  all  data  is  inaccessible  to  both  processor 
types,  which  means  that  at  least  one  processor  of  each  type,  one  controller  in 
each  set,  and  3  out  of  I  disk  units  in  each  of  the  six  disk  cluster  are  operational. 
We  also  nssumn  that  operational  components  continue  to  fail  at  the  given  rales 
when  (he  system  is  failed. 

We  make  minor  changes  to  the  above  parameters'  settings  in  order  to  rreate 
an  unbalanced  system.  We  increase  the  number  of  processors  of  each  type  to  1, 
and  double  each  processor's  failure  rale  to  1/1000  per  hour.  We  decrease  the 
failure  rales  of  all  other  components  by  a  fnetor  of  ten.  In  this  system,  although 
a  processor  failure  is  more  likely  to  occur  in  a  failure  transition,  it  is  less  likely 
to  cause  a  system  failure  due  to  the  high  processor  redundancy.  This  is  typical 
behavior  for  an  unbalanced  system. 

6.2.1  Steady-State  measures 

In  this  section  we  discuss  the  results  of  our  experiments  for  estimating  the 
steady-state  unavailability  and  the  mean  time  to  failure  and  their  sensitivities 
with  respect  to  the  parameters  rrt  (failure  mode  2  repair  rale)  and  c//r  (disk 
controller  1  failure  rate).  These  two  parameters  were  selected  to  demonstrate 
that  we  can  estimate  the  sensitivities  with  the  largest  magnitude  with  about 
the  same  relative  accuracy  as  the  regular  estimates,  .and  the  sensitivities  of  s- 
mailer  magnitude  are  not  estimated  as  precisely,  as  shown  by  the  example  in 
Section  -1.3.2.  Numerical  (non-simulation)  results  for  these  measures  and  their 
sensitivities  were  obtained  using  the  SAVE  package  [1 1).  Since  the  balanced 
system  has  a  few  hundred  thousand  slates  and  the  unbalanced  system  has  close 
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to  a  million  stales,  only  bounds  could  be  computed  (16).  These  bounds  are  very 
light  and  typically  do  not  differ  from  the  exact  results  significantly.  We  simulate 
both  the  balanced  and  the  unbalanced  system*.  The  goal  of  the  simulation  ex¬ 
periments  is  to  demonstrate  that  we  can  obtain  estimates  of  certain  sensitivities 
that  have  approximately  the  same  relative  error  as  the  regular  estimate.  Also, 
we  sec  that  the  various  variance  reduction  techniques  have  the  same  effect  on 
the  sensitivity  estimates  as  they  do  on  the  regular  estimates.  Significant  vari¬ 
ance  reductions  can  be  obtained  using  the  Bia*l/Bia»t  method  for  the  balanced 
systems  ami  Bta*1 /Balancing  method  for  the  unbalanced  systems,  as  is  shown 
in  (12).  These  results  hold  for  both  the  regular  estimates  and  the  sensitivities. 

Tables  2  and  3  show  the  results  obtained  for  the  balanced  and  the  unbalanced 
systems,  respectively.  We  ran  the.  simulation  long  enough  so  that  the  smallest 
entry  in  the  tables  for  the  percentage  relative  half-widths  of  the  90%  confidence 
intervals  was  less  than  5%.  The  percentage  relative  half-width  of  a  confidence 
interval  is  defined  to  be  100%  times  the  confidence  interval  haU-width  divided  by 
the  point  estimate.  This  corresponds  to  approximately  100,000  events  for  earb 
entry  in  Table  2  and  1,000,000  events  Tor  each  entry  in  Table  3,  respectively. 
Based  on  empirical  results  obtained  in  (10),  the  values  for  Haul  =  0.5  and 
biaal  ss  0.5  were  selected  for  DIS. 

There  are  a  few  important  points  to  note  in  the  tables.  For  the  balanced 
system,  we  used  the  Bia»l/Bia»t  method,  and  Bin*!/ Balancing  is  used  for  the 
unbalanced  system.  As  is  shown  in  (10),  these  methods  are  most  effective  for 
their  respective  models  when  estimating  the  regular  (non-gradient)  performance 
measures.  We  can  see  that  this  is  also  the  case  for  the  sensitivities  since  we 
obtain  estimates  of  the  largest  sensitivities  that  are  about  as  accurate  as  the 
regular  estimate. 
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The  relative  precision  of  the  regular  estimates  and  of  each  of  their  respective 
sensitivities  with  respect  to  rrf  are  approximately  equal,  which  agrees  with  the 
analytic  results  we  obtained  from  the  simple  examples  in  Section  1.3.1  Also,  as 
claimed  in  Section  1.3.2,  we  do  not  obtain  as  accurate  estimates  for  the  sensi¬ 
tivity  with  respect  to  cljr  since  it  is  of  smaller  magnitude.  It  is  also  interesting 
to  note  that  the  amount  of  improvement  from  importance  sampling  over  direct 
simulation  in  the  sensitivities  is  about  the  same  as  the  improvement  in  the  reg¬ 
ular  estimates.  This  is  because  the  snmc  likelihood  ratio  needed  for  importance 
sampling  is  used  in  both  the  regular  estimate  and  the  sensitivities,  and  the  like¬ 
lihood  ratio  in  both  cases  is  multiplied  by  the  accumulators  at  the  end  of  each 
cycle. 

Also  note  that  the  sensitivity  estimates  with  respect  to  e.ljr  in  the  unbalanced 
system  using  direct  simulation  given  in  Table  3  are  very  poor.  This  is  because  the 
value  of  c//r  is  much  smaller  than  the  value  of  parameter  pror.fr ,  the  processors’ 
failure  rale,  and  so  events  corresponding  to  failures  of  disk  controller  I  arc 
somewhat  rare  compared  to  failures  of  one  of  the  processors.  Therefore,  we  are 
not  able  to  obtain  accurate  results  for  both  the  point  estimate  and  the  variance 
of  the  sensitivity  with  respect  to  c//r.  However,  when  using  Bia»  I  /Balancing, 
we  are  able  to  obtain  much  belter  estimates  of  these  quantities. 

We  next  performed  coverage  experiments  (see  c.g.,  |t8])  to  determine  the 
validity  of  the  confidence  intervals  that  arc  formed  based  on  the  asymptotic 
central  limit  theorems  described  in  Section  4.  Such  studies  are  important  since 
certain  variance  reduction  techniques  sometimes  do  not  produce  valid  confidence 
intervals,  except  for  very  long  run-lengths  (see  c.g.,  (18]). 

We  performed  experiments  on  estimates  of  the  steady-stale  unavailability, 
U ,  and  its  sensitivities  with  respect  to  both  rrt  and  e//r,  denoted  by  f/,,j  and 
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Ue\jts  respectively,  in  the  above  described  balanced  system  as  follow  s.  We  chose 
three  run  lengths  corresponding  to  small,  medium  and  large  sample  sizes,  and  we 
considered  two  ways  of  estimating  II  and  its  sensitivities:  standard  simulation 
and  the  Bia*l/Biaa2  method  with  DIS.  For  eadi  method  and  run  length  we 
ran  R  =  100  replications  and  formed  point  estimates  {/),...,  f)n  of  the  regular 
estimate  and  for  0  =  rr2  and  ct/r,  of  the  sensitivity  estimates, 

and  00%  confidence  intervals  for  all  of  these  estimates.  We  then  calculated  the 
mean  percent  relative  bias  (=  100%-  (l/^)I2!Li(^f  “  bf)/(/  for  the  steady- 
stale  unavailability  estimator,  and  likewise  for  the  sensitivity  estimators)  ami 
the  standard  deviation  of  this  mean.  Note  that  it  an  rstimatc  is  unbiased,  then 
its  mean  percent  relative  bias  should  converge  to  zero  as  R  — ►  oo.  We  also 
calculated  the  00%  coverages,  which  is  the  percentage  of  the  (computed)  00% 
confidence  intervals  that  actually  contain  the  true  values  of  f/,  Urrj,  and 
respectively.  If  the  confidence  interval  is  valid,  then  by  definition,  the  90% 
coverage  should  be  equal  to  90%. 

We  also  computed  the  mean  percent  relative  half  width  of  the  90%  confidence 
intervals.  For  each  replication,  this  relative  mine  is  computed  using  the  point 
estimate  and  not  the  true,  value.  The  mean  is  computed  over  all  replications 
with  a  nonzero  point  estimate.  The  results  are  listed  in  Table  -1.  Note  that, 
as  also  seen  in  (12],  the  estimates  using  direct  simulation  arc  significantly  more 
biased  than  those  using  importance  sampling,  and  that  its  confidence  intervals 
are  about  an  order  of  magnitude  wider.  Also  note  that  the  values  of  the  relative 
bias  and  relative  half  widths  for  the  sensitivities  with  respect  to  rrt  are  about 
the  same  as  those  for  the  regular  estimate,  while  these  values  for  the  other  sen¬ 
sitivity  are  generally  worse.  This  agrees  with  the  results  given  in  Section  -1.3.2. 
Furthermore,  for  the  small  run  length,  the  coverage  drops  significantly  below 


32 


90%  when  using  direct  simulation.  Using  our  variance  reduction  technique,  all 
the  coverages  arc  rinse  to  the  nominal  00%  value. 

The  good  behavior  of  the  regenerative-based  steady-state  gradient  estimates 
described  here  can  be  expected  to  typirally  hold  for  the  types  of  models  gen¬ 
erated  by  the  SAVE  package.  Because  the  failure  rates  are  usually  orders  of 
magnitude  smaller  than  the  repair  rates,  regenerative  cycles  tend  to  be  short, 
with  a  typical  cycle  consisting  of  one.  failure  transit  ion  and  one  repair  transition. 
Even  when  using  importance  sampling,  regenerative  cycles  typically  consist  of 
only  a  few  failure  and  repair  transition  since  we  turn  ofT  failure  biasing  once  a 
system  failure  occurs  in  a  cycle.  As  a  consequence  we  found  it  unnecessary  to 
implement  alternative  variance  reduction  techniques  to  be  used  for  steady-stale 
regenerative  gradient  estimation  in  the  SAVE  package. 

6.2.2  Transient  Measures 

In  this  sort  ion  we  discuss  the  results  of  our  experiments  for  estimating  reliabilily 
and  its  sensitivity  with  respect  to  both  rrt  and  r.IJr.  Recall  that  for  transient 
measures  wc  not  only  want  the  system  to  move  quickly  towards  the  set  of  system 
failed  stales  F,  but  also  get  there  before  the  observation  period  expires.  For 
Markov  chain  simulations,  these  issues  are  (in  some  sense)  orthogonal,  since  the 
holding  limes  that  determine  the  hitting  time  are  conditionally  independent  of 
the  embedded  DTMC  that  is  biased  towards  hitting  F.  We  therefore  use  the 
same  technique  as  in  the  steady-slate  case  to  bias  the  embedded  Markov  chain 
towards  the  system  failed  set,  in  addition  to  another  independent  technique 
(c.g.,  forcing  as  discussed  in  Section  3.1)  to  reduce  the  variance  due  to  holding 
times  in  the  various  stales.  The  likelihood  ratios  corresponding  to  these  two 
aspects  of  simulation  arc  conditionally  independent  and  can  be  formulated  as 
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in  Section  3.1  and  in  (12].  The  goal  of  the  simulation  in  to  study  the  relative 
accuracy  of  the  regular  estimate  versus  its  sensitivities  and  to  compare  the  effects 
of  the  forcing  technique  on  these  quantities.  We  considered  only  the  balanced 
system.  For  each  measure,  we  allowed  each  method  to  run  for  400,000  events. 
The  results  are  given  in  Table  5. 

For  all  methods,  we  notice  that  the  confidence  intervals  arc  smaller  Tor  some 
range  of  intermediate  time  periods  and  wider  at  the.  ends.  Also,  the  three 
tables  indicate  that  forcing  is  most  effective  for  short  time  intervals.  These 
characteristics  are  discussed  in  (12], 

It  is  interesting  to  note  that  the  relative  accuracy  of  the  sensitivity  estimates 
with  respect  to  rrt  are  consistently  slightly  worse  than  that  of  the  regular  esti¬ 
mate,  which  strays  from  the.  result  that  we  obtained  Tor  the  steady-state  mea¬ 
sures.  This  is  because  we  are  working  with  transient  measures.  The  likelihood 
ratio  therefore  includes  terms  for  the  (random)  holding  times.  Thus,  when  wc 
compute  the  gradient  of  the.  likelihood  ratio,  we  arc  including  additional  random 
variables  corresponding  to  the  holding  times  in  the  sum,  thereby  increasing  vari¬ 
ability.  It  is  also  interesting  to  note  that  the  relative  accurary  of  the  sensitivity 
estimates  degrades  compared  to  that  of  the  regular  estimate  as  the  time  horizon 
increases.  This  is  because  the  length  of  each  observation  increases  as  the  time 
horizon  increases,  thus  increasing  the  number  of  random  variables  included  in 
the  sum  for  the  gradient  of  the  likelihood  ratio,  thereby  increasing  the  variance. 
This  is  similar  to  the  results  from  the  a-component  parallel  system. 

•.2.1  Timing  Experiments 

Finally,  Table  G  shows  the  results  from  some  timing  experiments  which  we  per¬ 
formed  in  order  to  determine  how  much  extra  CPU  time  is  required  to  compute 


sensitivities.  Tlic  experiments  consisted  of  different  simulation  runs  in  which  we 
varied  the  number  of  sensitivities  computed  and  recorded  the  amount  of  CPU 
time  taken  in  each  run.  All  of  the  experiments  were  carried  out  on  an  IBM 
3090  computer  using  the  SAVE  package,  simulating  the  balanced  system  with 
the  iionf /hinni  (0.5/0.5)  technique  for  100,000  events.  As  one  can  see,  there 
is  a  fairly  large  fixed  cost  in  CPU  time  for  computing  any  gradients,  but  the 
marginal  cost  in  CPU  time  for  computing  each  additional  gradient  is  small.  It 
is  interesting  to  note,  that  the  additional  time  required  to  compute  eight  sen¬ 
sitivities  is  about  the  same  as  the  amount  of  lime  needed  to  run  SAVE  when 
computing  no  gradients. 

7  Summary  and  Directions  for  Future  Work 

In  this  paper  we  have  shown  that  the  likelihood  ratio  gradient  estimation  tech¬ 
nique  ran  he  an  effective  praclieal  tool  for  computing  parameter  sensitivities  in 
large  Markovian  models  of  highly  dependable  systems.  In  faet,  both  our  analysis 
and  our  computational  experience  suggests  that  the  gradient  estimates  consid¬ 
ered  here  are  not  significantly  noisier  than  the  estimates  of  the  performance 
measures  themselves.  In  addition  to  discussing  implementation  issues  that  arise 
in  calculating  and  computing  such  gradient  estimators,  we  also  show  that  the 
derivative  and  expectation  interchange  implicit  in  obtaining  the  validity  of  the 
estimators  docs  in  fact  hold  for  a  wide  class  of  performance  measures  associated 
with  finite-state  continuous-time  Markov  chains. 

A  number  of  interesting  research  directions  present  themselves  for  future 
work: 

1.  development  of  additional  variance  reduction  techniques  for  the  likelihood 
ratio  gradient  estimator; 
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2.  an  analytic  proof,  for  the  general  Markovian  model  of  a  highly  dependable 
system,  that  the  variability  of  the  gradient  estimator  is  roughly  of  the 
same  order  as  that  of  the  performance  mensure  itself  (thereby  extending 
the  results  of  this  paper  beyond  our  current  three  and  five  state  examples 
given  in  Section  -t.3). 

3.  extending  the  methods  of  this  paper  to  nott-Markovian  models,  in  which 
the  failure  and  repair  times  arc  no  longer  necessnrily  exponential.  This 
will  necessitate  the  development  of  efTtrient  rion-regenerative.  techniques 
for  estimating  steady-state  gradients  in  a  rare-cvent  setting. 
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8  Appendix 

Now  wc  will  justify  the  interchange  of  derivative  and  expectation.  In  order  to 
do  so,  wc  will  make  the  following  assumptions: 

Al.  Stale  space  E  is  finite. 

A2.  Q()  is  continuously  differentiable  for  0  €  0. 

AS.  /’(/?)  is  irreducible  for  0  6  0. 

A4.  r(fl)  =  {(r,  y) :  P(0,x,y)  >  0}  is  independent  of  0 ,  for  0  6  0. 

AS.  T  is  a  stopping  time  satisfying  the  following  two  conditions: 

1.  r«{T>  0}  =  1  for  all  0  €  0. 

2.  There  exists  some  Zn  >  0  for  which  the  moment  generating  function 

of  N(T)  converges  for  all  z  6  (-zo,zn)  and  all  0  €  0. 

A#.  7(0)  has  one  of  the  following  forms: 

1.  7,(0)  =  I5,  where  5  is  some  (measurable)  set  of  even: 

2.  7,(0)  =  J r0  0,  where  T  is  somr  stopping  time  satisfying  as¬ 

sumption  AS  and  /  is  a  real- valued  function  defined  on  (0,  E)  satis¬ 
fying  assumptions  A  7  and  A  8. 

A7.  ||/||  =  *up{|/(0,x)| :  0  €  0,x  €  F,)  <  00. 

A8.  ||/'||  =  sup{|/'(ff,x)| :  0  €  ©,x  6  E)  <  00. 

Note  that  under  assumptions  Al  and  A3,  our  CTMC  Y  is  a  regenerative 
process.  Also  note  that  when  wc  express  a  steady-state  measure  r(0)  using 
the  ratio  formula,  we  have  that  r(0)  is  a  ratio  of  two  expectations  of  random 
variables,  each  having  the  second  form  of  Z(0)  given  in  assumption  A6. 
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Also,  note  that  by  assumption  A  I,  we  can  let  F'  =  r(0),  i.c.  r(0)  is  indepen¬ 
dent  of  0. 

The  first  condition  of  assumption  AS  is  necessary  to  insure  the  validity  of 
the  ratio  formula.  Also,  there  arc  numerous  examples  of  stopping  times  which 
satisfy  assumption  AS. 

Proposition  1  Define  T  =  Jot  some  deterministic  time  t,  i.e.  T  it 

(hr  lime,  oj  (hr  Jirtt  trnntilion  nftrr  timr  t.  Snppote  attnmptiont  A I  and  AS 
hold.  Then,  T  talitjirt  ottnmplion  AS. 

Proof.  First,  define  7*  =  sup{i}(0, i)  :  0  6  0,  f  6  E]  and  A'*(<)  =  sup{n  > 
0  :  tj  +  •••  +  tj  <  t},  where  tj  is  an  exponentially  distributed  random  variable 
with  mean  1/7*  for  all  k.  By  assumptions  A I  and  A2,  7*  <  00.  Therefore. 
{A'*(t)  :  I  >  0}  is  a  Poisson  process  with  rate  7*.  Let  M*(z)  be  the  moment 
generating  function  of  A'*(t).  Then 

A /'(*)*  <00 

for  all  finite  l  and  t.  Hence,  N*(()  lias  a  convergent  moment  generating  function 
for  all  finite  t  and  z.  Now,  Pi{N(t)  >  *}  <  P.{A'*(f)  >  jfc}  for  all  0  €  0,  where 
/*.{•}  is  the  probability  measure  corresponding  to  {N‘(l)  :  l  >  0}  (see  [21]). 
Thus,  we  have  that  A'(t)  also  has  a  convergent  moment  generating  function  for 
all  finite  f  and  2.  | 

Proposition  2  Dejinr  T  =  I,  where  I  >  0  is  tome  deterministic  time.  Snppote 
attnmptiont  At  end  At  hold.  Then,  T  teiitjict  aitumption  AS. 

Proof.  Since  t  <  7'/v(,)+1,  the  result  follows  from  Proposition  1.  | 

Proposition  8  Dejinc  T  =  re*  =  inf{/  >  0  :  Ij.  ^  A,  )\  €  A]  Jot  tome  tel  oj 
states  A,  i.e.  T  it  (he  hitting  time  to  tome  tel  oj  states  A.  Snppote  atiumptiont 
At  -  AJ  hold.  Then,  T  totitfiet  assumption  AS. 


Proof.  Since  T  =  inf {#  :  $  A,)\  €  /!}  for  some  set  A,  we  have  that 

N(T)  =  inf{n  >  0  :  A*„  £  A).  Let  x  £  A.  By  assumptions  A3,  and  A4,  we  have 
that  tor  each  y  €  E,  there  exists  an  integer  m(y,x)  such  that  > 

0,  where  Pk(0)  is  the  k-slep  transition  matrix  of  the  embedded  DTMC  X.  Lei 
m  -  max{m(y,x) :  y  £  E,x  £  A),  which  is  finite  since  \E\  <  oo.  Now  we  have 
that  for  all  x  £  A, 

Pt{N(T)  >  »> | An  =  y}  <  /’,{ A'n(y,r)  *  *|.Ya  =  y} 

=  i  _  Pmb'r\0%y,x) 

<  />, 

where  p  -  snp{l  —  y,.r) :  0  £  ©,y  £  F,,x  €  A).  By  assumptions  At. 

A2,  and  A<1,  inf{r(0,r.y) :  0  £  O.(r.y)  €  T}  >  0,  and  so  p  <  1. 

Now  we  have 

P,{N[T)  >  mi.) 

=  E  WTO  >  »"(»  -  =  y}/VW  i  s  * |A*n  =  y} 

J*A 

<  t»E  )  >  n»(n  -  l).A‘m(m_i,  =  y} 

ftA 

=  pr${N(T)  >  m(n  -  1)}. 

By  induction,  we  obtain 

F»{N{T)>mn)<p\ 

So  we  have  the  moment  generating  function 

=  E«*"/WAf(n  =  »*} 

«*n 

oo  m 

=  P,{/V(7)  =  0}  4  Y,  E  e*("m+,,r,{Af(T)  =  nm  +  l) 

«=0  (si 

<  r,\N{T)  =  0}  +  Y,  h{N(T)  >  »m) 

«*0  |sl 
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For  z  <  0,  we  have  that 


M?iT\t)  <  /,»{A'(7‘)  =  0}  +  ^  mPt{N(T)  >  nm}c**m 

•— n 

<  />,{A'(r)  =  0}+mf;p'"]\ 

For  2  >  0,  we  have  that 

oo 

M?(r)(z)  <  /,#{A’(7-)  =  0}  +  ^m/>#{A’(7)>nm}c*("+l)m 

R=n 

oo 

<  /»,{A'(7  )  =  0)  4  me*" 

•=p 1 

lienee,  there  exists  some  zn  >  0  such  that  A/*{7>(*)  <  oc  Tor  all  z  6 
and  all  OgO,  | 

We  now  stale  a  lemma. 

Lemma  1  //  nznumption*  A I -AS  hold,  then 

EtZ(0)k  <  oo 

E,Z'{0)k  <  oo 

Jnr  all  k  and  all  6  €  0. 

Proof.  When  Z(0)  =  15,  (he  result  obviously  holds.  So  now  assume  Z(0)  = 
J0T  f(0,V.)d'-  Then  we  have  that  \Z{0)\  <  ||/||7’  and  \Z'{0)\  <  ||/'||7\  Now 
assumption  A5  implies  that  E$Tk  <  00  for  all  k  and  all  0  6  0,  which,  along 
with  assumptions  A7  and  A8,  gives  ns  our  result.  | 

Hence,  we  have  that  the  performance  measures  discussed  in  Section  2.3  sat¬ 
isfy  assumptions  A5-A8.  Now  we  will  justify  the  interchange  of  the  derivative 
and  expectation. 
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Theorem  1  //  assnmpfion*  .4/  -  AH  hold,  1hr.n 


-[/v#tZ(fl)f,(7\Mo)]#_#n  =  Zv#c7'(ffo)4  FJtnZ(0„)i'{T,0„,0n). 

Proof.  To  justify  the  interchange!,  we  will  show  that  (he  difference  quotients 
ir'[Z(On  4  h)l,(7\  On  4  Mn)  -  Z(0p)] 

are  dominated  by  an  inlegrahle  random  variahle.  By  the  mean  value  theorem, 
we  have  that  the  difference  quotient  is  equal  to 

^'(»,)M7,,q,fln)4^(q)/.'(7>b«l.). 

for  some  q  €  (0p,  do  4  /*). 

Define 


117*11  =  *«ip{|ij'(0,;r)| :  -  0,,|  <  A.x  €  Tv} 

II7V9II  =  *"P{l7*(0i*)/7(0.,-)l  t  \0  ~  H(,\  <  h,  r  £  E] 
ll7/7(^o)||  =  Mip{|7(0,r)/q(ffp.  r)| :  \°  ~  0n|  <  h,  z  €  Tv} 
Il7-9(fln)||  =  sup{|ij(0,z)  -  q[On,r)\ :  \0  -  0p|  <  h,T  6  Tv} 

\\P'/r\\  =  mv{r'(0,r,y)/r(0,,.,y):\a-l)o\<h,{T,y)eV} 
IIW*o)ll  =  *up{|r(<?,r,w)//»(fln,z,V)l « 1 0  -  «Q|  <  M*.»)  €  !’}, 


where  P  =  P(0).  By  assumptions  A  I,  A2,  and  A4  ,  all  of  these  terms  are  finite 
From  F/qualion  3.2,  we  have  that  f/(7\  q,0p)  is  equal  to 
r*<r)f _  _  \ 


Xj  I  J(Sl5j)  ”  *'(•»•  xj)lJ  +  J  -  7’(n.  A* tvcn+1  )(7’  -  7/v(T)) 

[nrii1  ss  .v.)  -  ijwsa] 

•  cxp{-(g(q,  A'/v(j->+i)  “  7(^o.  A'/v(T)+t))(7’-  )} ■ 


Now  wc  can  bound  |//(7\  rj,0n)|  by 


(f*[T)  +  i)(ll7'/7!(  +  r //’II)  +  ll7,IIEj'J,T)  tj  +  mr  -  w] 

•  Il7/7(^)r-(r)+1||r//'(dn)r<r’+’  cxp{||7  -  qm\ZklV  <*} 

•  exP{||7  _  7(^o)||(T  -  7V(r))}, 


whirlt  can  in  turn  be  bounded  by  where 


N(T)+i 


MV  =  (Af(7)  +  l)(|kV7ll  +  II^VHI)  +  117*11  £  t) 


(8.1) 


i= o 


,  N(T)+ I  . 

+,('>)  =  ikMfln)ir(r,+,ii/7/,(ffn)ir<T>+,«*p|iiff-«(«o)ii  £  4  • 

*  n  ' 


Note  I  hat  we  can  bound  \Z(0a)l,'(T, »/.  flo)|  by  ^(/<)  =  |2(0n)|^i(h)^j(/»).  So  we 
now  want  to  show  that  ^(h)  is  inlegrnble  for  /«  sufficiently  small.  To  do  this,  we 
will  show  that  there  exists  some  zu  >  0  such  that  M$[z),  which  wc  define  as  the 
moment  generating  function  of  4.  converges  for  all  z  €  (-*n,  *n)  and  all 

0  6  0. 

First  define  ijm  =  inf {q(0,r)  :0  €  ©,r  6  /?}.  Then 
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for  z  sufficiently  small  since  N(T)  has  a  convergent  moment  generating  function 
in  a  neighborhood  of  0  by  assumption  AS.  Since  M^T\z)  and  Af«(z)  both 
converge  in  a  neighborhood  of  0  for  nil  0  g  0,  we  have  that  N(T)  and  Y,ki I*  4 
have  finite  moments  of  all  orders  for  all  0  6  0. 

Now  note  that  by  assumption  A2.  ||<7/9(0fi)||  -*  1,  \\P/ r(0r>)||  -*  1,  and  U7- 
9(0o)||  — *  0  as  h  1  0.  Hence,  by  repeaLed  applications  of  the  Schwarz  inequality 
and  using  Lemma  I,  we  have  that  ^(/i„)  is  inlegiable  for  some  ho  >  0  which  is 
sufficiently  small.  Now  noting  (hat  for  0  <  h\  <  /»„,  we  have  <h(h\)  <  d>{h n),  and 
so  we  can  use  <l>(ho)  as  our  dominating  rnndom  variable  for  Z(0o)L'(T,r).0„). 
Thus,  we  have  shown  that  Z(0»)L'{T,  q,ff(,)  is  integrable.  Similarly,  we  can  show 
that  Z'(ri)l,(T,r),0o)  can  also  be  dominated  by  an  integrable  random  variable. 
Hence,  by  noting  that  L(T,  »j,0r.)  —  1  as  /1  J  0,  the  proof  is  complete.  | 

Now  we.  will  give  a  proof  of  the  asymptotic  variance  of  our  estimator  of  r'(/?n) 
given  in  Equation  In  order  to  do  this,  we  need  the  following  result  (see  [2 1], 
p.  118,  for  the  proof). 

Theorem  2  (Central  Limit  Theorem)  Let  X,-,f  =  1,2,...,  ■’  ■pcnde.n- 
(  and  identically  distributed  d-dime.nsional  rnndom  vector  with  mean  vector  // 
and  covariance  matrir  L,  and  avpjmse  g  :  R  is  differentiable  at  /».  // 

E||X,||J  <  00,  then 

\/n[.9(X«)  -  »(/»)]  =>  A'(0 ,<rJ) 

as  n  -*  oo,  where 


«r*  =  r9(,i)7  E?ff(„) 


and  Vjgt(-)  it  the  gradient  oj  g. 


Now  we  give  a  proof  of  (lie  expression  for  llie  asymptotic  variance  of  the 
estimate  of  the  gradient  when  using  the  ratio  formula. 

Theorem  S  if  assumptions  .4  /  -  Aft  hold,  then  the  asymptotic  variance  of  the 
estimate.  of  r'(t?0)  using  Equation  f.t  is  given  by  Equation  f.t. 

Proof.  We  define  the  vector  V  =  (/I,  B ,  C,  D ),  where 

a  =  z'T{0)  +  z(0)i;(r.0o,0o) 
n  =  r 

C  =  Zr(0) 

/)  =  TL'(T,0n,0c>) 

as  in  Equations  1.3-4. 0.  In  order  to  apply  Theorem  2,  we  first  need  to  show  that 
£»»ll^ll7  <  oo.  Assumption  A5  and  Lemma  1  show  that  T ,  Zj-{0)t  and  Z'T(tt) 
all  have  finite  moments  of  all  orders.  Now  note  that  |//(7”,  0n,0n)|  is  bounded 
by  4>\(h)  for  all  /i  >  0,  where  d>\(h)  is  defined  in  Equation  8.1  in  (he  proof  of 
Theorem  I.  Sinre  <f>\{h)  has  a  moment  generating  function  which  converges  for 
all  sufficiently  small  li,  we  have  that  L'(T,Oo,Of,)  also  has  finite  moments  of  all 
orders.  Hence,  by  repeated  applications  of  the  Sell  wars  inequality,  we  have  that 
A,  B,  C,  and  l)  all  have  finite  second  moments,  which  implies  Zi»n||V||J  <  oo. 
In  order  to  apply  Theorem  2,  we  define  g  :  R*  — *  3?  as 

g(o,t,r,rf)  =  a>>pCd-  (8.2) 

The  first  condition  of  assumption  A5  assures  that  g  is  differentiate  at  the 
point  (a,  /J,7,<).  Thus,  by  computing  the  gradient  of  g  and  plugging  in  the 
appropriate  values  into  the  expression  for  the  variance  given  in  Theorem  2,  the 
proof  is  complete.  | 
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Now  wc  show  dial  we  obtain  belter  estimates  of  the  gradients  when  we  use 
conditional  Monte  Carlo.  Before  we  prove  the  result,  we  make  some  definitions. 
We  define  the  vector  V  =  (/I,  H,C ,  D)  as  in  the  proof  of  Theorem  3,  and  wc  let 
W  =  E#,|1'|X]  =  (A,B,C,  /)),  where 

A  =  E,,{/l|X]  =  f7'((?0)  +  (7(/?ri)//(r0,(?n,f?n) 

B  =:  E*,|fl|X]  =  H(0n) 

C  =  E»,[C|X)  a*  G(0„) 

f)  =  E,„[/;|X]  =  ir{0o)  +  //((9r,)//(r0,  ff„A), 

where  G(0),  11(0),  G'(0),  W(0),  and  //'(r«,  flo.fln)  arc  defined  in  Equations  -1.8, 
4.9,  4.12,  '1.13.  and  4.M,  respectively.  Let  /i  =  E|„V  =  W.  Then,  by 
Theorem  2,  wc  have  that 

)-.<?(/')]  =>  Af(0,ir?) 


So  wc  have  that  <r\  and  n\  arc  the  variances  of  the  gradient  estimators  when 
using  the  ratio  formula  obtained  without  and  with  conditional  Monte  Carlo, 
respectively.  Then  wc  have  the  following  result. 
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Proposition  4  «rj  <  <r| . 

Proof.  By  noting  that 

Vff(//)r(W  -  ,/)  =  /v,„[Vff(,/)T(V  -  ,.)|XJ. 

we  have  the  result  by  the  principle  of  conditional  Monte  Carlo  (see  [3]).  I 
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Steady-State 

Unavailability 

Numerical 

Result 

Direct 

Simulation 

0.1000  x  10"’ 

0.0991  x  10-’ 

±  1.6% 

0.1000  x  io-J 

0.1011  x  10“’ 

±  4.1% 

0.1000  x  ur* 

0.0999  x  10"’ 

±  6.5% 

0.1000  x  io-s 

0.1007  x  10-’ 

±  6.5% 

0.1000  x  to-’ 

0.1010  x  10"’ 

±  8.1% 

0.1000  X  10-’ 

0.1020  X  10“’ 

±  8.3% 

Sensitivity 
w.r.t.  A 

Numerical 

Result 

Direct 

Simulation 

0.1954  x  10-’ 

0.1934  x  10-’ 

±  1.6% 

0.3593  x  10-’ 

0.3559  x  10“’ 

±  4.7% 

0.4901  x  10-’ 

0.4872  x  10"’ 

±  7.3% 

0.5859  x  10-’ 

0.5677  x  10-’ 
db  9.0% 

0.6457  x  10'’ 

0.6230  x  I0-’ 

±  10.4% 

ma 

0.6784  X  10-’ 

±  13.1% 

0.0229 

2.05 

0.0R8I 

2.63 

0.1233 

4.12 

0.1375 

7.53 

0.1426 

16.39 

0.1442 

43.89 

Tabic  I:  Estimates  of  steady-slate  unavailability  and  sensitivities  with  relative  90%  confidence  intervals  for  an 
n-componcnt  system  using  direct  simulation  (1,021, 000  events) 
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(a)  Results  for  regular  estimates 


Performance 

Measure 


Sensitivity  w.r.t.  rr2 


Numerical  Direct 

Result  Simulation 


Unavailability 


MTTF 


— .  1 256  x  10-' 

±  33.0% 

-.1205  x  10"4 
±  3.3% 

0.0*7!)  X  I046 
db  33.1% 

0.1109  x  104* 

±  2.6% 

(b)  Results  for  sensitivities  w.r.t.  rr2 


(c)  Results  for  sensitivities  w.r.t.  cl  fr 


Table  2:  Estimates  of  steady-stale  unavailability,  MTTF,  and  sensitivities  with 
relative  90%  confidence  intervals  for  the  balanced  system  (100,000  events) 


\ 
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(a)  Results  for  regular  estimates 


Performance 

Sensitivity  w.r.t.  rr2 

Measure 

Numerical 

Result 

Direct 

Simulation 

■iiraiiinjffiEri 

Unavailability 

-.9136  x  10" 7 

-.7939  x  10"7 
±  164.5% 

-.9384  x  10“7 
±  3.1% 

MTTF 

0.1181  x  10+' 

'  4555  x  J0+* 

±  164.5% 

0.1470  x  10+l 
±  2.4% 

(l>)  Results  for  sensitivities  w.r.t.  rr2 


Performance 

Measure 


Sensitivity  w.r.t.  cl  Jr 


Numerical  Direct 

Result  Simulation 


Unavailability 


MTTF 


0.2324  x  10”T  -.3425  x  JO-10  0.23(51  x  10-7 

±  1GG.8%  ±  6.0% 


-.7298  x  10+7  0.1104  x  10+s  -.7318  x  l6+7 

±  191.0%  ±  5.2% 


(c)  Results  for  sensitivities  w.r.t.  cl  Jr 


Table  3:  Estimates  of  steady-state  unavailability,  MTTF,  and  sensitivities  with 
relative  90%  confidence  intervals  for  the  unbalanced  system  (1,000,000  events) 
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(a)  Results  for  steady-state  unavailability 


_ Sensitivity  of  unavailability  w.r.t.  rr2 


Direct  ||  Biasl/Bias2 

Simulation 


Rel 

HW 

Coverage 

155.96% 

•16% 

82.30% 

81% 

25.27% 

94% 

0.5/0.5 


Rel  Coverage 


1 

23.26% 

1 - 

84% 

7.46% 

92% 

2.38% 

86% 

(b)  Results  for  sensitivity  of  unavailability  w.r.t.  rr2 


Sensitivity  of  unavailability  w.r.t.  cl  Jr 


Direct 


Simulation 


Rel 

HW 

Coverage 

432.36% 

11% 

147.02% 

62% 

56.79% 

86% 

IiMii 


KEEBl 


Coverage 


46.45% 

■  1 

90% 

14.86% 

82% 

4.68% 

95% 

(c)  Results  for  sensitivity  of  unavailability  w.r.t.  cl/r 


Te.ble  4:  Coverage  experiments  for  estimates  of  steady-state  unavailability  and 
sensitivities  on  the  balanced  system  (100  replications) 


53 


Unreliability 


Time  (t) 


Numerical 

Direct  Simulation 

Result 

Standard 

Forcin 

1  Standard 

Forcin 

0.1583  x  10" 4 
±  7.0% 

0.1522  x  10"4 
±  1.4% 

0.8693  x  10" 4 
±  3.3% 

0.8699  x  10"4 
±  1.3% 

0.3801  x  10"* 
±  1.8% 

0.3841  x  10"* 

±  1.1% 

0.1565  x  10"2 
±  1.5% 

0.1578  x  10"2 
±  1.0% 

0.6275  x  10"2 
±  4.9% 

0.6233  x  10"2 
±  4.3% 

0.8731  x  10" 


0.3804  X  J0- 


0.1552  x  JO"2 


0.G225  x  10- 


1.0721  x  tO-"* 
db  37.7% 


0.3552  x  10"* 
±  24.0% 


0.1463  x  10“2 
±  16.8% 


0.5598  x  10"2 
±  14.9% 


0.1481  x  10"4 
±  23.9% 


0.9428  x  10"4 
±  22.8% 


0.3421  x  10"* 
±  21.9% 


0.1578  x  10"2 
±  19.9% 


0.62t'4  x  10"2 
±  16.0% 


(a)  Results  for  unreliability 


Time  (t) 


_  Sensitivity  of  Unreliability  w.r.t.  rr2 


Numerical  Direct  Simulation _ 

Result  II  Standard  |  Fbreing  II  Standard 


Forcin 


4 

-0.4353  x  10"* 

-0.1778  x  10" 5 
±  117.0% 

-0.5126  x  10"* 

±  39.8 % 

— 0.4J  13  x  10"* 
±  13.4% 

-0.4280  x  10"* 

±  2.9% 

16 

-0.4886  x  10"4 

-0.6391  x  10"4 
±  59.8% 

-0.4799  x  10"4 
±  48.0% 

-0.4845  x  10"4 
±  6.1% 

-0.4853  x  10"4 
i  2.5% 

64 

-0.2455  x  10"* 

-0.2084  x  10"* 
±  46.7% 

-0.1679  x  10"* 

±  41.6% 

-0.2420  x  10"* 

±  3.6% 

-0.2467  x  10~* 

±  2.3% 

256 

-0.1031  x  10“2 

-0.0928  x  10"* 
±  35.0% 

-0.0877  x  10"2 
±  42.4% 

-0.1040  x  10"* 

±  4.0% 

-0.1055  x  10"2 
±  2.7% 

1024 

-0.4156  X  10  *2 

-0.5200  x  10-2 
±  33.4% 

-0.450 1  X  10"2 
±  40.8% 

-0.4220  X  10"2 
±  15.0% 

-0.3965  x  10-2 
±  12.6% 

(b)  Results  for  sensitivities  of  unreliability  w.r.t.  rr2 
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(c)  Results  for  scniiilivilies  of  unreliability  w.r.t.  cl  Jr 


Table  5:  Estimates  of  unreliability  and  sensitivities  with  relative  90%  confidence  intervals  for  the  balancrd  system 
(400,000  events) 
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Tahir  6:  CPU  times  lakrn  for  computing  sensitivities  in  balanced  system  using 
bia*1/bin*Z  (0.5/0. 5)  for  100,000  simulated  events 
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